XMLLR for improved speaker adaptation in speech recognition

Authors

  • Daniel Povey
  • Hong-Kwang Jeff Kuo
Abstract

In this paper we describe a novel technique for adaptation of Gaussian means. The technique is related to Maximum Likelihood Linear Regression (MLLR), but we regress not on the mean itself but on a vector associated with each mean. These associated vectors are initialized by an ingenious technique based on eigen decomposition. As the only form of adaptation this technique outperforms MLLR, even with multiple regression classes and Speaker Adaptive Training (SAT). However, when combined with Constrained MLLR (CMLLR) and Vocal Tract Length Normalization (VTLN) the improvements disappear. The combination of two forms of SAT (CMLLR-SAT and MLLR-SAT) which we performed as a baseline is itself a useful result; we describe it more fully in a companion paper. XMLLR is an interesting approach which we hope may have utility in other contexts, for example in speaker identification.
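
The contrast the abstract draws between MLLR and XMLLR can be illustrated with a small sketch. This is not the authors' implementation: the function names, the appended constant 1 in each associated vector, and the PCA-style initialization are all assumptions made for illustration, since the abstract only states that the associated vectors are initialized by "a technique based on eigen decomposition" without giving details.

```python
import numpy as np


def mllr_adapt(means, A, b):
    """MLLR-style update: the transform acts on each Gaussian mean directly."""
    return means @ A.T + b


def init_associated_vectors(means, dim):
    """Hypothetical initialization (an assumption, not the paper's method):
    project the centered means onto the top `dim` eigenvectors of their
    covariance and append a constant 1 so the transform can carry an offset."""
    center = means.mean(axis=0)
    centered = means - center
    cov = centered.T @ centered / len(means)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    basis = eigvecs[:, ::-1][:, :dim]        # keep the top-`dim` eigenvectors
    coords = centered @ basis
    ones = np.ones((len(means), 1))
    return np.hstack([coords, ones]), basis, center


def xmllr_adapt(assoc, W):
    """XMLLR-style update: regress on the vector associated with each mean,
    not on the mean itself."""
    return assoc @ W.T
```

In this sketch, a speaker-independent starting transform would be `W0 = np.hstack([basis, center[:, None]])`; when `dim` equals the feature dimension, `xmllr_adapt(assoc, W0)` reproduces the original means, and a per-speaker `W` would then be estimated from adaptation data.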

Similar resources

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Convolutional Neural Network with Adaptive Windows for Speech Recognition

Although speech recognition systems are widely used and their accuracy continues to improve, there is still a considerable gap between their accuracy and human recognition ability. This is partly due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...

Publication date: 2008